fix(docker): add HEALTHCHECK instruction to Dockerfile by dknos · Pull Request #1485 · NVIDIA/NemoClaw

dknos · 2026-04-05T00:32:36Z

Summary

Adds HEALTHCHECK instruction to Dockerfile for container health monitoring
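Enables Docker/Compose to flag unhealthy containers in standalone deployments, so an orchestrator or restart tooling can detect and replace them (Docker's own `--restart` policies act on container exit, not on health status)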
Adds hadolint ignore comments for pre-existing warnings (DL3006, SC2015)

Test plan

Build Docker image and verify HEALTHCHECK runs
Confirm container reports healthy status via docker inspect
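The test plan's `docker inspect` check can be scripted. The sketch below is a minimal POSIX shell helper and is not part of this PR. `wait_for_healthy` and its defaults (20 attempts, 5 seconds apart) are hypothetical choices. It only reads the status Docker already records (`starting`, `healthy`, or `unhealthy`). Standalone Docker marks a container `unhealthy` but does not restart it by itself, so the helper just reports the outcome.

```shell
#!/bin/sh
# Poll a container's Docker health status until it settles.
# Hypothetical helper: the name and the defaults below are illustrative.
wait_for_healthy() {
  container="$1"
  attempts="${2:-20}"  # number of polls (assumed default)
  delay="${3:-5}"      # seconds between polls (assumed default)
  health=""
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # .State.Health.Status is "starting", "healthy" or "unhealthy".
    # If the container has no HEALTHCHECK the lookup fails and we keep polling.
    health="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)" || health=""
    case "$health" in
      healthy)   echo "healthy";   return 0 ;;
      unhealthy) echo "unhealthy"; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out (last status: ${health:-unknown})"
  return 1
}
```

Calling `wait_for_healthy my-container` (hypothetical name) returns 0 only once Docker reports `healthy`, so it can gate a CI step after `docker run`.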

Fixes #1430

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Container health monitoring enabled: periodic runtime and gateway probes now surface unhealthy states automatically, improving deployment reliability and observability and allowing orchestrators to detect and react to failed containers sooner.

Enables Docker and orchestrators to detect unhealthy containers and trigger automatic restarts. Also adds hadolint ignore comments for pre-existing warnings (DL3006, SC2015). Fixes NVIDIA#1430 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-04-05T00:32:48Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1368591e-9827-4200-97f6-719927cab281

📥 Commits

Reviewing files that changed from the base of the PR and between 0a43ca7 and 0e2fb12.

📒 Files selected for processing (1)

test/Dockerfile.sandbox

📝 Walkthrough

Added HEALTHCHECK instructions to three Dockerfiles: the runtime image probes the gateway at http://127.0.0.1:18789/health, while the base and sandbox images run a minimal Node.js probe. Also added two hadolint suppression comments. No other build or runtime steps were changed.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Runtime Dockerfile (`Dockerfile`) | Added `HEALTHCHECK` using `curl -sf http://127.0.0.1:18789/health` with interval/timeout/start-period/retries. Inserted a hadolint suppression comment (`DL3006`). No other build/runtime changes. |
| Base image Dockerfile (`Dockerfile.base`) | Added `HEALTHCHECK` running `node -e "process.exit(0)"` with interval/timeout/start-period/retries. No other changes. |
| Test sandbox Dockerfile (`test/Dockerfile.sandbox`) | Added `HEALTHCHECK` running `node -e "process.exit(0)"` with interval/timeout/start-period/retries (overrides the inherited check). Inserted a hadolint suppression comment (`SC2015`). No other changes. |
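To debug the runtime image's probe by hand, it can help to run the same `curl` command the `HEALTHCHECK` uses and to read back what Docker recorded. This is a minimal shell sketch that assumes a running container whose name you pass in. The function names are hypothetical, and the URL is the one from the runtime `Dockerfile`. Docker keeps a short history of probe results in `.State.Health.Log`, oldest first.

```shell
# Hypothetical helpers for debugging the runtime image's health probe.

# Re-run the HEALTHCHECK's HTTP probe inside the container.
probe_gateway() {
  docker exec "$1" curl -fsS http://127.0.0.1:18789/health
}

# Print the exit code of the most recent probe Docker recorded.
last_probe_exit_code() {
  docker inspect \
    --format '{{range .State.Health.Log}}{{.ExitCode}}{{"\n"}}{{end}}' "$1" |
    tail -n 1
}
```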

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant DockerEngine as Docker Engine
    participant Container as App Container
    participant HealthProbe as HEALTHCHECK Probe

    DockerEngine->>Container: start container
    DockerEngine-)HealthProbe: schedule probe (interval/retries)
    HealthProbe->>Container: execute probe (HTTP curl or node)
    alt probe success
        Container-->>HealthProbe: 2xx / exit 0
        HealthProbe-->>DockerEngine: report healthy
    else probe failure
        Container-->>HealthProbe: non-2xx / non-zero exit
        HealthProbe-->>DockerEngine: report unhealthy
    end
```

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I hop inside the container bright,
I tap the port and listen tight,
A curl, a blink, a tiny nod,
The node agrees — exit code odd? nod not!
Now Docker hums, my healthchecks light. 🥕✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'fix(docker): add HEALTHCHECK instruction to Dockerfile' clearly summarizes the main change: adding a HEALTHCHECK instruction to Docker configuration files for health monitoring. |
| Linked Issues check | ✅ Passed | The PR implements all primary objectives from issue `#1430`: adds HEALTHCHECK to Dockerfile and Dockerfile.base to enable container health detection in standalone Docker deployments, and also adds one to test/Dockerfile.sandbox. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: HEALTHCHECK directives were added to three Dockerfile variants with appropriate health probe configurations, and hadolint suppressions for pre-existing warnings were included as documented. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

Dockerfile (1)

177-178: Consider adding a startup grace period to reduce cold-start false negatives.

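A short `--start-period` typically improves reliability when services take time to bootstrap: probe failures during the start period do not count toward `--retries`, so a slow cold start is not reported as unhealthy.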

Proposed tweak

```diff
-HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
-    CMD curl -sf http://127.0.0.1:18789/health || exit 1
+HEALTHCHECK --interval=30s --timeout=5s --start-period=45s --retries=3 \
+    CMD curl -fsS http://127.0.0.1:18789/health || exit 1
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` around lines 177 - 178, The Dockerfile's HEALTHCHECK currently
lacks a startup grace period, causing cold-starts to be reported as failures;
update the HEALTHCHECK directive that uses "HEALTHCHECK --interval=30s
--timeout=5s --retries=3 CMD curl -sf http://127.0.0.1:18789/health || exit 1"
to include a suitable --start-period (for example --start-period=60s or another
value appropriate for your service) so Docker will wait before beginning health
probes, keeping the rest of the flags intact.
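The grace period does not have to be fixed at build time. This is a minimal sketch that assumes a plain `docker run` deployment: the standard `--health-start-period` flag overrides just that field, and current Docker engines fill in the probe command and the other timings from the image's `HEALTHCHECK`. The 90-second default and the function name are hypothetical; choose a value that matches how long the gateway actually takes to start.

```shell
# Hypothetical helper: start an image with a longer health-check grace period
# than the one baked into its HEALTHCHECK, without rebuilding it.
run_with_grace_period() {
  image="$1"
  grace="${2:-90s}"  # assumed default; Go-style durations such as 2m also work
  docker run -d --health-start-period="$grace" "$image"
}
```

With Compose, the equivalent setting is `start_period` under the service's `healthcheck:` key.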

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 315a1d49-e077-4507-a236-379162a409cd

📥 Commits

Reviewing files that changed from the base of the PR and between c99e3e8 and 27f1319.

📒 Files selected for processing (1)

Dockerfile

coderabbitai · 2026-04-05T00:37:15Z

Dockerfile

```diff
+# Health check: poll the gateway's /health endpoint so Docker (and Compose)
+# can detect and restart unhealthy containers in standalone deployments.
+# Ref: https://github.com/NVIDIA/NemoClaw/issues/1430
+HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
+    CMD curl -sf http://127.0.0.1:18789/health || exit 1
+
```


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
echo "Dockerfiles present:"
fd -i '^Dockerfile(\.base)?$' | sort
echo
echo "HEALTHCHECK declarations by file:"
fd -i '^Dockerfile(\.base)?$' --exec rg -n '^\s*HEALTHCHECK\b' {}
```

Repository: NVIDIA/NemoClaw

Length of output: 199

Add HEALTHCHECK to Dockerfile.base as well.

Issue #1430 requires HEALTHCHECK in both Dockerfile and Dockerfile.base. The change currently only addresses Dockerfile. Add the same HEALTHCHECK directive to Dockerfile.base at the appropriate location.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Dockerfile` around lines 174 - 179, The Dockerfile change added a HEALTHCHECK block (HEALTHCHECK --interval=30s --timeout=5s --retries=3 \ CMD curl -sf http://127.0.0.1:18789/health || exit 1) but the same directive must also be added to Dockerfile.base; open Dockerfile.base and add an identical HEALTHCHECK directive (same interval, timeout, retries and CMD curl -sf http://127.0.0.1:18789/health || exit 1) at the analogous location where health checks are defined in Dockerfile (near the end or after the service start/EXPOSE steps) so both images include the same container health probe.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (1)

Dockerfile.base (1)
128-129: Derived test image inherits insufficient base HEALTHCHECK

The base HEALTHCHECK (lines 128-129) only verifies that `node` can execute, not that the actual service is running. Verification confirms that test/Dockerfile.sandbox inherits from ${BASE_IMAGE} but does not define its own HEALTHCHECK override, so it inherits this probe and can report false-positive health.

To mitigate this, either require every child image to define a service-level HEALTHCHECK probe, or parameterize the base command so inheritors can inject a meaningful check (e.g., a startup script or port check).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@Dockerfile.base` around lines 128 - 129, The base HEALTHCHECK is too
superficial (only running node) and causes child images like
test/Dockerfile.sandbox to inherit a false-positive probe; add a build-time
argument (e.g., ARG HEALTHCHECK_CMD) and use it in the HEALTHCHECK instruction
so inheritors can supply a meaningful check, or alternatively remove the base
HEALTHCHECK and document that every child image must define its own HEALTHCHECK;
update the HEALTHCHECK line to reference HEALTHCHECK_CMD and ensure
test/Dockerfile.sandbox sets HEALTHCHECK_CMD (or defines its own HEALTHCHECK) so
the probe actually validates the service.
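Whichever option is chosen, it helps to confirm what a built image actually carries, since a child image inherits its parent's `HEALTHCHECK` unless it declares its own (or `HEALTHCHECK NONE`). This is a minimal sketch using `docker image inspect`; the function names and image tags are hypothetical. The comparison is coarse: a child that re-declares an identical probe looks the same as one that inherited it.

```shell
# Hypothetical helpers for checking HEALTHCHECK inheritance between images.

# Print an image's health-check config as JSON ("null" if it has none).
image_healthcheck() {
  docker image inspect --format '{{json .Config.Healthcheck}}' "$1"
}

# Report whether a child image's health check differs from its base image's.
compare_healthchecks() {
  base_hc="$(image_healthcheck "$1")"
  child_hc="$(image_healthcheck "$2")"
  if [ "$base_hc" = "$child_hc" ]; then
    echo "same as base (inherited or identical)"
  else
    echo "overridden"
  fi
}
```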

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ee49aa74-0422-41e5-815b-0ce53bf2d577

📥 Commits

Reviewing files that changed from the base of the PR and between 27f1319 and 8d48ef6.

📒 Files selected for processing (2)

Dockerfile
Dockerfile.base

🚧 Files skipped from review as they are similar to previous changes (1)

Dockerfile

…est image Address CodeRabbit feedback: expand the Dockerfile.base HEALTHCHECK comment to explicitly document that child images exposing services MUST override the baseline probe, and add an explicit HEALTHCHECK to test/Dockerfile.sandbox so it does not inherit a potentially service-oriented check that would produce false positives. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (1)

test/Dockerfile.sandbox (1)

89-92: Remove the redundant `|| exit 1` suffix.

The `node -e "process.exit(0)"` command already returns non-zero on failure, which Docker HEALTHCHECK treats as unhealthy. The `|| exit 1` adds no functional value.

♻️ Suggested simplification

```diff
 # Test image: no long-running service, so just verify the runtime works.
 # Overrides any inherited HEALTHCHECK to avoid false-positive probes.
 HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
-    CMD node -e "process.exit(0)" || exit 1
+    CMD node -e "process.exit(0)"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/Dockerfile.sandbox` around lines 89 - 92, The HEALTHCHECK line includes
a redundant shell suffix "|| exit 1" after the command `node -e
"process.exit(0)"`; remove the `|| exit 1` so the HEALTHCHECK simply runs `node
-e "process.exit(0)"` (Docker already treats a non-zero exit from that command
as unhealthy). Update the HEALTHCHECK invocation (the line containing
HEALTHCHECK --interval=30s ... CMD node -e "process.exit(0)" || exit 1) to drop
the `|| exit 1` fragment.
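Why the suffix is safe to drop: Docker's HEALTHCHECK reference documents exit status 0 as healthy and 1 as unhealthy, and reserves 2. The `cmd || exit 1` idiom folds any other non-zero status (for example curl's 7 or 22) into 1, and in practice the engine counts every non-zero status as a failed probe. The sketch below uses hypothetical function names and only shows what the idiom does to exit codes.

```shell
# Hypothetical helpers illustrating the `|| exit 1` idiom.

# Run a probe command and print its raw exit status.
raw_exit_code() {
  "$@" >/dev/null 2>&1 && rc=0 || rc=$?
  echo "$rc"
}

# Run the same probe with `|| exit 1` applied, as in `CMD curl ... || exit 1`.
normalized_exit_code() {
  ( "$@" || exit 1 ) >/dev/null 2>&1 && rc=0 || rc=$?
  echo "$rc"
}
```

For example, `raw_exit_code sh -c 'exit 22'` prints 22 while `normalized_exit_code sh -c 'exit 22'` prints 1; both print 0 for a probe that succeeds.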

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 76cfeeb0-cab5-4ba0-b302-e04a4fbc653f

📥 Commits

Reviewing files that changed from the base of the PR and between 8d48ef6 and 0a43ca7.

📒 Files selected for processing (2)

Dockerfile.base
test/Dockerfile.sandbox

✅ Files skipped from review due to trivial changes (1)

Dockerfile.base

wscurran · 2026-04-06T15:07:57Z

✨ Thanks for submitting this fix, which proposes a way to add a HEALTHCHECK instruction to the Dockerfile. This enables better container health monitoring and improves reliability in standalone deployments.

Possibly related open issues:

#1430 No Dockerfile HEALTHCHECK — Unhealthy Containers Not Detected in Standalone Docker Deployments

node -e "process.exit(0)" already returns non-zero on failure; the || exit 1 suffix adds no functional value and was flagged by CodeRabbit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

dknos

Addressed: removed the redundant `|| exit 1` suffix from the HEALTHCHECK in test/Dockerfile.sandbox. `node -e "process.exit(0)"` already returns non-zero on failure, so the extra suffix was dead code. The HEALTHCHECK instructions in the base image (Dockerfile.base) and the main Dockerfile were added in earlier commits on this branch and remain unchanged.

coderabbitai bot reviewed Apr 5, 2026

fix(docker): add HEALTHCHECK to Dockerfile.base and start-period grace

8d48ef6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai bot reviewed Apr 5, 2026

coderabbitai bot reviewed Apr 6, 2026

wscurran added the `observability` (improve NemoClaw logging, metrics, and tracing), `Docker` (support for Docker containerization), and `fix` labels on Apr 6, 2026

fix(docker): remove redundant || exit 1 from sandbox HEALTHCHECK

0e2fb12

dknos commented Apr 6, 2026

dknos wants to merge 4 commits into NVIDIA:main from dknos:fix/dockerfile-healthcheck-1430
